Overview

Brought to you by YData

Dataset statistics

                                 Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    443          432
Missing cells (%)                8.3%         8.1%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B

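The percentages in this table follow directly from the counts. Missing cells are divided by rows × columns, so Dataset A gives 443 / (446 × 12) ≈ 8.3% and Dataset B gives 432 / 5352 ≈ 8.1%. Duplicate rows count every row that repeats an earlier one. The sketch below recomputes these figures for data held as a list of tuples, with None marking a missing cell. It is a plain-Python illustration, not ydata-profiling's implementation, and the helper name `dataset_statistics` is hypothetical.

```python
def dataset_statistics(rows):
    """Recompute the counting figures of the Dataset statistics table.

    rows: list of equal-length tuples; None marks a missing cell.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # A row counts as a duplicate if an identical row appeared earlier.
    duplicates = n_obs - len(set(rows))
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


rows = [(1, "male", None), (2, "female", 38.0), (2, "female", 38.0)]
print(dataset_statistics(rows))

# The report's own figures: 443 and 432 missing cells out of 446 * 12 cells.
print(f"{100 * 443 / (446 * 12):.1f}%", f"{100 * 432 / (446 * 12):.1f}%")  # 8.3% 8.1%
```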
Variable types

               Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Type              Dataset A                                         Dataset B
High correlation  Sex is highly overall correlated with Survived    Sex is highly overall correlated with Survived
High correlation  Survived is highly overall correlated with Sex    Survived is highly overall correlated with Sex
Missing           Age has 91 (20.4%) missing values                 Age has 89 (20.0%) missing values
Missing           Cabin has 350 (78.5%) missing values              Cabin has 342 (76.7%) missing values
Unique            PassengerId has unique values                     PassengerId has unique values
Unique            Name has unique values                            Name has unique values
Zeros             SibSp has 308 (69.1%) zeros                       SibSp has 308 (69.1%) zeros
Zeros             Parch has 340 (76.2%) zeros                       Parch has 343 (76.9%) zeros
Zeros             Fare has 6 (1.3%) zeros                           Fare has 11 (2.5%) zeros

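The report flags Sex and Survived as highly correlated but does not show the statistic behind the alert. For two categorical variables, a common measure of association is Cramér's V, V = sqrt(χ² / (n · (min(r, c) - 1))). It ranges from 0 (no association) to 1 (perfect association). The sketch below computes it from a contingency table. Two points are assumptions here: that ydata-profiling uses this exact statistic, and the threshold it applies before raising the alert. The tables are made-up counts, not the Titanic data.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table (list of rows of counts).

    Assumes every row total and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Made-up 2 x 2 tables (rows: two groups, columns: outcome 0/1).
print(cramers_v([[50, 0], [0, 50]]))    # 1.0: outcome fully determined by group
print(cramers_v([[25, 25], [25, 25]]))  # 0.0: no association
```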
Reproduction

                          Dataset A                     Dataset B
Analysis started          2025-03-05 15:19:05.669029    2025-03-05 15:19:08.170735
Analysis finished         2025-03-05 15:19:08.167522    2025-03-05 15:19:10.646825
Duration                  2.5 seconds                   2.48 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json

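Each Duration is the difference between the Analysis finished and Analysis started timestamps, rounded for display: 2.498493 s becomes 2.5 s and 2.47609 s becomes 2.48 s. Below is a minimal sketch using the standard `datetime` module; `analysis_duration` is a hypothetical helper.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def analysis_duration(started: str, finished: str) -> float:
    """Seconds elapsed between two timestamps written in the report's format."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


duration_a = analysis_duration("2025-03-05 15:19:05.669029", "2025-03-05 15:19:08.167522")
duration_b = analysis_duration("2025-03-05 15:19:08.170735", "2025-03-05 15:19:10.646825")
print(round(duration_a, 2), round(duration_b, 2))  # 2.5 2.48
```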
Variables

PassengerId
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         446          446
Distinct (%)     100.0%       100.0%
Missing          0            0
Missing (%)      0.0%         0.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             440.91704    443.95067
Minimum          5            1
Maximum          890          890
Zeros            0            0
Zeros (%)        0.0%         0.0%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      5            1
5-th percentile              45.5         46.5
Q1                           207.5        205.5
median                       432          443.5
Q3                           678.75       678.75
95-th percentile             854.75       845.75
Maximum                      890          890
Range                        885          889
Interquartile range (IQR)    471.25       473.25

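Range and IQR are derived from the other rows. Range = Maximum - Minimum, so 890 - 5 = 885 for Dataset A. IQR = Q3 - Q1, so 678.75 - 207.5 = 471.25. Fractional quantiles such as 207.5 can arise from linear interpolation between neighbouring sorted values, which is the default method of pandas' `Series.quantile`. Assuming the report uses that default, the sketch below reproduces the method in plain Python. `quantile_linear` and `quantile_summary` are illustrative names.

```python
import math


def quantile_linear(values, q):
    """q-quantile (0 <= q <= 1) by linear interpolation at position q * (n - 1)."""
    data = sorted(values)
    if not data:
        raise ValueError("quantile of empty data")
    pos = q * (len(data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def quantile_summary(values):
    """The rows of the Quantile statistics table for one dataset."""
    levels = [("5-th percentile", 0.05), ("Q1", 0.25), ("median", 0.5),
              ("Q3", 0.75), ("95-th percentile", 0.95)]
    summary = {name: quantile_linear(values, p) for name, p in levels}
    summary["Minimum"], summary["Maximum"] = min(values), max(values)
    summary["Range"] = summary["Maximum"] - summary["Minimum"]
    summary["Interquartile range (IQR)"] = summary["Q3"] - summary["Q1"]
    return summary


print(quantile_summary([1, 2, 3, 4, 5, 6, 7, 8]))
```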
Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 266.47689        261.86245
Coefficient of variation (CV)      0.60436968       0.58984583
Kurtosis                           -1.277178        -1.2697806
Mean                               440.91704        443.95067
Median Absolute Deviation (MAD)    236.5            236.5
Skewness                           0.06262959       0.0095143986
Sum                                196649           198002
Variance                           71009.932        68571.944
Monotonicity                       Not monotonic    Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Dataset A
Value                 Count    Frequency (%)
737                   1        0.2%
418                   1        0.2%
875                   1        0.2%
820                   1        0.2%
151                   1        0.2%
389                   1        0.2%
889                   1        0.2%
623                   1        0.2%
311                   1        0.2%
203                   1        0.2%
Other values (436)    436      97.8%

Dataset B
Value                 Count    Frequency (%)
462                   1        0.2%
875                   1        0.2%
9                     1        0.2%
693                   1        0.2%
96                    1        0.2%
755                   1        0.2%
40                    1        0.2%
27                    1        0.2%
225                   1        0.2%
423                   1        0.2%
Other values (436)    436      97.8%

Minimum 10 values

Dataset A
Value    Count    Frequency (%)
5        1        0.2%
6        1        0.2%
7        1        0.2%
9        1        0.2%
10       1        0.2%
11       1        0.2%
12       1        0.2%
13       1        0.2%
14       1        0.2%
15       1        0.2%

Dataset B
Value    Count    Frequency (%)
1        1        0.2%
2        1        0.2%
8        1        0.2%
9        1        0.2%
13       1        0.2%
14       1        0.2%
15       1        0.2%
16       1        0.2%
18       1        0.2%
22       1        0.2%

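The Common values tables list the ten most frequent values, followed by an "Other values (k)" row that pools the remaining k distinct values. Percentages are taken over all rows, which is why Age's (Missing) row counts toward the denominator. The Minimum 10 values tables list the smallest distinct values in ascending order. Below is a plain-Python sketch of both. It assumes that ties in count keep first-seen order, since the report's own tie-breaking is not shown. The function names are hypothetical.

```python
from collections import Counter


def common_values(values, top=10):
    """(value, count, percent) rows: top values, 'Other values (k)', then '(Missing)'."""
    n = len(values)
    counts = Counter(values)
    missing = counts.pop(None, 0)  # None marks a missing entry
    ranked = counts.most_common()  # by count, ties in first-seen order
    rows = [(v, c, 100 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / n))
    if missing:
        rows.append(("(Missing)", missing, 100 * missing / n))
    return rows


def minimum_values(values, k=10):
    """(value, count) for the k smallest distinct non-missing values, ascending."""
    counts = Counter(v for v in values if v is not None)
    return [(v, counts[v]) for v in sorted(counts)[:k]]


sample = [3, 3, 3, 1, 1, 2, 4, 5, 6, None]
for row in common_values(sample, top=2):
    print(row)
print(minimum_values(sample, k=3))  # [(1, 2), (2, 1), (3, 3)]
```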
Survived
Categorical

                 Dataset A    Dataset B
Distinct         2            2
Distinct (%)     0.4%         0.4%
Missing          0            0
Missing (%)      0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Value            Dataset A    Dataset B
0                273          278
1                173          168

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                        Dataset A    Dataset B
Total characters        446          446
Distinct characters     2            2
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

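In this report every character is counted under a single "(unknown)" category, script, and block, so those breakdowns carry no information. For comparison, Python's standard `unicodedata` module returns the Unicode general category of a code point directly. Survived's digits, for example, would fall under Nd (decimal number). The sketch below counts characters per category for a string made from Dataset B sample values. It is an illustration, not the report's implementation, and `category_counts` is a hypothetical name.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` per Unicode general category (e.g. Lu, Ll, Nd, Zs)."""
    return Counter(unicodedata.category(ch) for ch in text)


print(category_counts("Lam, Mr. Ali 1601"))
```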
Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    1            1
2nd row    1            1
3rd row    0            1
4th row    0            0
5th row    0            1

Common Values

Dataset A
Value    Count    Frequency (%)
0        273      61.2%
1        173      38.8%

Dataset B
Value    Count    Frequency (%)
0        278      62.3%
1        168      37.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar charts of the common values for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value    Count    Frequency (%)
0        273      61.2%
1        173      38.8%

Dataset B
Value    Count    Frequency (%)
0        278      62.3%
1        168      37.7%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 446 characters, 100.0%; Dataset B: 446 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Pclass
Categorical

                 Dataset A    Dataset B
Distinct         3            3
Distinct (%)     0.7%         0.7%
Missing          0            0
Missing (%)      0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Value            Dataset A    Dataset B
3                251          233
1                105          114
2                90           99

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                        Dataset A    Dataset B
Total characters        446          446
Distinct characters     3            3
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    2            2
2nd row    2            3
3rd row    3            3
4th row    2            3
5th row    3            2

Common Values

Dataset A
Value    Count    Frequency (%)
3        251      56.3%
1        105      23.5%
2        90       20.2%

Dataset B
Value    Count    Frequency (%)
3        233      52.2%
1        114      25.6%
2        99       22.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar charts of the common values for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value    Count    Frequency (%)
3        251      56.3%
1        105      23.5%
2        90       20.2%

Dataset B
Value    Count    Frequency (%)
3        233      52.2%
1        114      25.6%
2        99       22.2%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 446 characters, 100.0%; Dataset B: 446 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Name
Text

                 Dataset A    Dataset B
Distinct         446          446
Distinct (%)     100.0%       100.0%
Missing          0            0
Missing (%)      0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       65           82
Median length    48           47
Mean length      26.008969    27.116592
Min length       12           12

Characters and Unicode

                        Dataset A    Dataset B
Total characters        11600        12094
Distinct characters     60           59
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

           Dataset A                                Dataset B
1st row    Silven, Miss. Lyyli Karoliina            Abelson, Mrs. Samuel (Hannah Wizosky)
2nd row    Abelson, Mrs. Samuel (Hannah Wizosky)    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
3rd row    Skoog, Master. Karl Thorsten             Lam, Mr. Ali
4th row    Bateman, Rev. Robert James               Shorney, Mr. Charles Joseph
5th row    Sadlier, Mr. Matthew                     Herman, Mrs. Samuel (Jane Laver)

Most frequent words

Dataset A
Value                 Count    Frequency (%)
mr                    258      14.6%
miss                  102      5.8%
mrs                   61       3.5%
william               29       1.6%
henry                 18       1.0%
john                  18       1.0%
master                17       1.0%
george                14       0.8%
mary                  12       0.7%
james                 12       0.7%
Other values (874)    1224     69.3%

Dataset B
Value                 Count    Frequency (%)
mr                    266      14.6%
miss                  90       5.0%
mrs                   63       3.5%
william               34       1.9%
john                  24       1.3%
henry                 16       0.9%
master                16       0.9%
charles               15       0.8%
thomas                14       0.8%
george                14       0.8%
Other values (902)    1265     69.6%

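The word table appears to be built in three steps: lowercase each value, split it on whitespace, and trim punctuation from both ends of each token. Under that rule Mr. becomes mr, and ticket prefixes such as W./C. and SOTON/O.Q. become w./c and soton/o.q in the Ticket table. The rule is inferred from the output, not taken from ydata-profiling's source. The sketch below implements it; `word_counts` is a hypothetical name.

```python
import string
from collections import Counter


def word_counts(values):
    """Lowercase, split on whitespace, trim leading/trailing punctuation, count."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens made only of punctuation
                counts[word] += 1
    return counts


names = ["Abelson, Mrs. Samuel (Hannah Wizosky)", "Lam, Mr. Ali", "Sadlier, Mr. Matthew"]
print(word_counts(names).most_common(2))  # [('mr', 2), ('abelson', 1)]
```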
Most occurring characters

Dataset A
Value                Count    Frequency (%)
(space)              1319     11.4%
r                    941      8.1%
e                    828      7.1%
a                    798      6.9%
i                    655      5.6%
s                    636      5.5%
n                    635      5.5%
M                    561      4.8%
l                    502      4.3%
o                    467      4.0%
Other values (50)    4258     36.7%

Dataset B
Value                Count    Frequency (%)
(space)              1373     11.4%
r                    1005     8.3%
e                    866      7.2%
a                    832      6.9%
i                    672      5.6%
n                    656      5.4%
s                    641      5.3%
M                    553      4.6%
l                    541      4.5%
o                    523      4.3%
Other values (49)    4432     36.6%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 11600 characters, 100.0%; Dataset B: 12094 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Sex
Categorical

                 Dataset A    Dataset B
Distinct         2            2
Distinct (%)     0.4%         0.4%
Missing          0            0
Missing (%)      0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Value            Dataset A    Dataset B
male             281          289
female           165          157

Length

                 Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.7399103    4.7040359
Min length       4            4

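For a categorical column, the length statistics can be reproduced from the value counts alone. Dataset A has 281 "male" (4 characters) and 165 "female" (6 characters). That gives 281 × 4 + 165 × 6 = 2114 characters in total and a mean length of 2114 / 446 ≈ 4.7399. Below is a sketch; `length_summary` is a hypothetical helper.

```python
import statistics


def length_summary(value_counts):
    """Length statistics of a categorical column given {value: count}."""
    lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
        "Total characters": sum(lengths),
    }


print(length_summary({"male": 281, "female": 165}))  # Dataset A
print(length_summary({"male": 289, "female": 157}))  # Dataset B
```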
Characters and Unicode

                        Dataset A    Dataset B
Total characters        2114         2098
Distinct characters     5            5
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    female       female
2nd row    female       female
3rd row    male         male
4th row    male         male
5th row    male         female

Common Values

Dataset A
Value     Count    Frequency (%)
male      281      63.0%
female    165      37.0%

Dataset B
Value     Count    Frequency (%)
male      289      64.8%
female    157      35.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar charts of the common values for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value    Count    Frequency (%)
e        611      28.9%
m        446      21.1%
a        446      21.1%
l        446      21.1%
f        165      7.8%

Dataset B
Value    Count    Frequency (%)
e        603      28.7%
m        446      21.3%
a        446      21.3%
l        446      21.3%
f        157      7.5%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 2114 characters, 100.0%; Dataset B: 2098 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Age
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         78           75
Distinct (%)     22.0%        21.0%
Missing          91           89
Missing (%)      20.4%        20.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             29.961746    30.252577
Minimum          0.42         0.67
Maximum          80           71
Zeros            0            0
Zeros (%)        0.0%         0.0%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

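Age's Distinct (%) is computed over the non-missing values only. 78 distinct ages out of 446 - 91 = 355 observed values is 22.0%, whereas 78 / 446 would be 17.5%. Missing (%), by contrast, uses all 446 rows. Below is a sketch of both conventions with None marking a missing entry; `missing_and_distinct` is a hypothetical helper.

```python
def missing_and_distinct(values):
    """Missing count and % over all rows; Distinct % over non-missing values only."""
    n = len(values)
    present = [v for v in values if v is not None]  # None marks a missing entry
    n_missing = n - len(present)
    n_distinct = len(set(present))
    return {
        "Distinct": n_distinct,
        "Distinct (%)": 100 * n_distinct / len(present) if present else 0.0,
        "Missing": n_missing,
        "Missing (%)": 100 * n_missing / n if n else 0.0,
    }


print(missing_and_distinct([22.0, 38.0, None, 22.0]))
# Report check for Age, Dataset A: 78 distinct out of 446 - 91 = 355 observed values.
print(f"{100 * 78 / (446 - 91):.1f}%", f"{100 * 91 / 446:.1f}%")  # 22.0% 20.4%
```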
Quantile statistics

                             Dataset A    Dataset B
Minimum                      0.42         0.67
5-th percentile              5            6.8
Q1                           20           21
median                       28           28.5
Q3                           39           39
95-th percentile             58           56.2
Maximum                      80           71
Range                        79.58        70.33
Interquartile range (IQR)    19           18

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 14.983298        14.189995
Coefficient of variation (CV)      0.50008092       0.46905079
Kurtosis                           0.27959368       0.10180537
Mean                               29.961746        30.252577
Median Absolute Deviation (MAD)    9                8.5
Skewness                           0.51784624       0.43054085
Sum                                10636.42         10800.17
Variance                           224.49921        201.35597
Monotonicity                       Not monotonic    Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Dataset A
Value                Count    Frequency (%)
19                   17       3.8%
30                   16       3.6%
22                   15       3.4%
25                   13       2.9%
18                   13       2.9%
24                   13       2.9%
28                   12       2.7%
32                   11       2.5%
21                   11       2.5%
36                   10       2.2%
Other values (68)    224      50.2%
(Missing)            91       20.4%

Dataset B
Value                Count    Frequency (%)
24                   18       4.0%
30                   14       3.1%
28                   14       3.1%
18                   14       3.1%
22                   12       2.7%
29                   12       2.7%
25                   11       2.5%
19                   11       2.5%
34                   11       2.5%
26                   10       2.2%
Other values (65)    230      51.6%
(Missing)            89       20.0%

Minimum 10 values

Dataset A
Value    Count    Frequency (%)
0.42     1        0.2%
0.67     1        0.2%
0.75     2        0.4%
0.83     1        0.2%
1        4        0.9%
2        2        0.4%
3        1        0.2%
4        5        1.1%
5        4        0.9%
6        2        0.4%

Dataset B
Value    Count    Frequency (%)
0.67     1        0.2%
1        2        0.4%
2        4        0.9%
3        1        0.2%
4        5        1.1%
5        3        0.7%
6        2        0.4%
7        3        0.7%
8        1        0.2%
9        3        0.7%

SibSp
Real number (ℝ)

                 Dataset A     Dataset B
Distinct         7             7
Distinct (%)     1.6%          1.6%
Missing          0             0
Missing (%)      0.0%          0.0%
Infinite         0             0
Infinite (%)     0.0%          0.0%
Mean             0.48430493    0.51793722
Minimum          0             0
Maximum          8             8
Zeros            308           308
Zeros (%)        69.1%         69.1%
Negative         0             0
Negative (%)     0.0%          0.0%
Memory size      7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            1
95-th percentile             2            3
Maximum                      8            8
Range                        8            8
Interquartile range (IQR)    1            1

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 1.0004383        1.0968356
Coefficient of variation (CV)      2.0657197        2.1176999
Kurtosis                           17.610261        16.621726
Mean                               0.48430493       0.51793722
Median Absolute Deviation (MAD)    0                0
Skewness                           3.5668817        3.5670788
Sum                                216              231
Variance                           1.0008767        1.2030483
Monotonicity                       Not monotonic    Not monotonic

[Figure: histogram with fixed size bins (bins=7)]
Common values

Dataset A
Value    Count    Frequency (%)
0        308      69.1%
1        104      23.3%
2        12       2.7%
3        11       2.5%
4        6        1.3%
5        3        0.7%
8        2        0.4%

Dataset B
Value    Count    Frequency (%)
0        308      69.1%
1        100      22.4%
2        14       3.1%
4        10       2.2%
3        8        1.8%
5        3        0.7%
8        3        0.7%

Minimum 10 values

Dataset A
Value    Count    Frequency (%)
0        308      69.1%
1        104      23.3%
2        12       2.7%
3        11       2.5%
4        6        1.3%
5        3        0.7%
8        2        0.4%

Dataset B
Value    Count    Frequency (%)
0        308      69.1%
1        100      22.4%
2        14       3.1%
3        8        1.8%
4        10       2.2%
5        3        0.7%
8        3        0.7%

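SibSp and Parch are drawn with bins=7 instead of the bins=50 used for the other numeric columns, which matches their seven distinct values. The idea that ydata-profiling caps the bin count at the number of distinct values is an assumption drawn from these captions. The sketch below shows fixed-size (equal-width) binning over [Minimum, Maximum]. Bins are half-open, except the last one, which also includes the maximum; this is the convention `numpy.histogram` documents. It is applied here to Dataset A's SibSp column rebuilt from its value counts, with bins of width 8/7. `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(values, bins):
    """Counts per equal-width bin over [min, max]; the last bin is closed on the right."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        index = int((x - lo) / width) if width else 0
        counts[min(index, bins - 1)] += 1  # x == max lands in the last bin
    return counts


# SibSp, Dataset A, rebuilt from its value counts (0: 308, 1: 104, 2: 12, 3: 11, 4: 6, 5: 3, 8: 2).
sibsp_a = [0] * 308 + [1] * 104 + [2] * 12 + [3] * 11 + [4] * 6 + [5] * 3 + [8] * 2
print(fixed_width_histogram(sibsp_a, bins=7))  # [412, 12, 11, 6, 3, 0, 2]
```

Because the bins have fixed width rather than one bin per value, 0 and 1 share the first bin and no SibSp value falls in the sixth.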
Parch
Real number (ℝ)

                 Dataset A     Dataset B
Distinct         7             7
Distinct (%)     1.6%          1.6%
Missing          0             0
Missing (%)      0.0%          0.0%
Infinite         0             0
Infinite (%)     0.0%          0.0%
Mean             0.38116592    0.36995516
Minimum          0             0
Maximum          6             6
Zeros            340           343
Zeros (%)        76.2%         76.9%
Negative         0             0
Negative (%)     0.0%          0.0%
Memory size      7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           0            0
95-th percentile             2            2
Maximum                      6            6
Range                        6            6
Interquartile range (IQR)    0            0

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 0.82815864       0.79599757
Coefficient of variation (CV)      2.1726985        2.1516056
Kurtosis                           11.842398        10.474245
Mean                               0.38116592       0.36995516
Median Absolute Deviation (MAD)    0                0
Skewness                           3.0057124        2.8083941
Sum                                170              165
Variance                           0.68584673       0.63361213
Monotonicity                       Not monotonic    Not monotonic

[Figure: histogram with fixed size bins (bins=7)]
Common values

Dataset A
Value    Count    Frequency (%)
0        340      76.2%
1        61       13.7%
2        37       8.3%
5        3        0.7%
4        2        0.4%
3        2        0.4%
6        1        0.2%

Dataset B
Value    Count    Frequency (%)
0        343      76.9%
1        56       12.6%
2        40       9.0%
4        3        0.7%
3        2        0.4%
5        1        0.2%
6        1        0.2%

Minimum 10 values

Dataset A
Value    Count    Frequency (%)
0        340      76.2%
1        61       13.7%
2        37       8.3%
3        2        0.4%
4        2        0.4%
5        3        0.7%
6        1        0.2%

Dataset B
Value    Count    Frequency (%)
0        343      76.9%
1        56       12.6%
2        40       9.0%
3        2        0.4%
4        3        0.7%
5        1        0.2%
6        1        0.2%

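The descriptive statistics are tied together by simple identities. Variance is the square of Standard deviation, computed with an n - 1 denominator. CV = Standard deviation / Mean. MAD is the median of the absolute deviations from the median. More than half the Parch values are 0, so both the median and the MAD are 0. Rebuilding Dataset A's Parch column from its value counts reproduces the reported Mean 0.38116592, Variance 0.68584673, and CV 2.1726985. The sketch uses only the standard `statistics` module. Its simplified monotonicity labels are an assumption, not the report's full set of labels, and the helper names are hypothetical.

```python
import statistics


def descriptive(values):
    """Mean, sample variance/std (n - 1 denominator), CV, MAD, and sum."""
    mean = statistics.fmean(values)
    variance = statistics.variance(values)
    std = variance ** 0.5
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    return {"Mean": mean, "Variance": variance, "Standard deviation": std,
            "Coefficient of variation (CV)": std / mean,
            "Median Absolute Deviation (MAD)": mad, "Sum": sum(values)}


def monotonicity(values):
    """Simplified labels: 'Increasing', 'Decreasing', or 'Not monotonic'."""
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# Parch, Dataset A, rebuilt from its value counts.
parch_a = [0] * 340 + [1] * 61 + [2] * 37 + [3] * 2 + [4] * 2 + [5] * 3 + [6] * 1
stats = descriptive(parch_a)
print(stats)
print(monotonicity([3, 1, 2]))  # Not monotonic
```

The rebuilt list is sorted, so its own monotonicity would be Increasing. The report's Not monotonic reflects the original row order.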
Ticket
Text

                 Dataset A    Dataset B
Distinct         380          382
Distinct (%)     85.2%        85.7%
Missing          0            0
Missing (%)      0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.7668161    6.7713004
Min length       3            4

Characters and Unicode

                        Dataset A    Dataset B
Total characters        3018         3020
Distinct characters     35           32
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        328          336
Unique (%)    73.5%        75.3%

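Distinct counts how many different values occur, while Unique counts only the values that occur exactly once. Dataset A's Ticket column has 380 distinct tickets but only 328 unique ones, so 52 ticket numbers are shared by two or more passengers. The Cabin figures show that Unique (%), like Distinct (%), is taken over non-missing values: 71 unique cabins make 74.0% of the 96 non-missing values. Below is a sketch with a hypothetical helper and made-up sample tickets.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values seen exactly once.

    Percentages are taken over non-missing values (None marks a missing entry).
    """
    counts = Counter(v for v in values if v is not None)
    n_present = sum(counts.values())
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "Distinct": distinct,
        "Distinct (%)": 100 * distinct / n_present if n_present else 0.0,
        "Unique": unique,
        "Unique (%)": 100 * unique / n_present if n_present else 0.0,
    }


tickets = ["347088", "347088", "1601", "P/PP 3381", "P/PP 3381", "220845", None]
print(distinct_and_unique(tickets))
```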
Sample

           Dataset A      Dataset B
1st row    250652         P/PP 3381
2nd row    P/PP 3381      347742
3rd row    347088         1601
4th row    S.O.P. 1166    374910
5th row    367655         220845

Most frequent words

Dataset A
Value                 Count    Frequency (%)
pc                    27       4.8%
c.a                   12       2.1%
a/5                   7        1.2%
ca                    6        1.1%
soton/oq              6        1.1%
ston/o                6        1.1%
2                     6        1.1%
w./c                  6        1.1%
c                     5        0.9%
soton/o.q             5        0.9%
Other values (401)    481      84.8%

Dataset B
Value                 Count    Frequency (%)
pc                    32       5.7%
c.a                   11       2.0%
ca                    7        1.2%
2                     6        1.1%
sc/paris              6        1.1%
a/5                   6        1.1%
ston/o                6        1.1%
ston/o2               4        0.7%
soton/oq              4        0.7%
19950                 4        0.7%
Other values (402)    477      84.7%

Most occurring characters

Dataset A
Value                Count    Frequency (%)
3                    370      12.3%
1                    355      11.8%
2                    302      10.0%
7                    242      8.0%
4                    239      7.9%
6                    201      6.7%
5                    197      6.5%
0                    185      6.1%
9                    154      5.1%
8                    149      4.9%
Other values (25)    624      20.7%

Dataset B
Value                Count    Frequency (%)
3                    368      12.2%
1                    349      11.6%
2                    311      10.3%
7                    239      7.9%
4                    235      7.8%
6                    208      6.9%
0                    207      6.9%
5                    194      6.4%
9                    171      5.7%
8                    133      4.4%
Other values (22)    605      20.0%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 3018 characters, 100.0%; Dataset B: 3020 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Fare
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         189          178
Distinct (%)     42.4%        39.9%
Missing          0            0
Missing (%)      0.0%         0.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             31.077017    33.698981
Minimum          0            0
Maximum          512.3292     512.3292
Zeros            6            11
Zeros (%)        1.3%         2.5%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.225        7.05105
Q1                           7.925        7.8958
median                       13.8604      14.4542
Q3                           30           31.359375
95-th percentile             110.8833     130.2375
Maximum                      512.3292     512.3292
Range                        512.3292     512.3292
Interquartile range (IQR)    22.075       23.463575

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 46.188385        51.455387
Coefficient of variation (CV)      1.4862554        1.5269122
Kurtosis                           32.29171         22.770471
Mean                               31.077017        33.698981
Median Absolute Deviation (MAD)    6.6104           7.225
Skewness                           4.5812418        4.0133961
Sum                                13860.35         15029.746
Variance                           2133.3669        2647.6568
Monotonicity                       Not monotonic    Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Dataset A
Value                 Count    Frequency (%)
8.05                  25       5.6%
7.75                  20       4.5%
7.8958                20       4.5%
13                    18       4.0%
26                    15       3.4%
10.5                  14       3.1%
7.25                  10       2.2%
7.925                 9        2.0%
7.775                 8        1.8%
7.225                 8        1.8%
Other values (179)    299      67.0%

Dataset B
Value                 Count    Frequency (%)
8.05                  25       5.6%
13                    23       5.2%
7.75                  19       4.3%
7.8958                17       3.8%
26                    14       3.1%
0                     11       2.5%
10.5                  11       2.5%
7.925                 10       2.2%
26.55                 10       2.2%
7.2292                9        2.0%
Other values (168)    297      66.6%

Minimum 10 values

Dataset A
Value     Count    Frequency (%)
0         6        1.3%
4.0125    1        0.2%
5         1        0.2%
6.2375    1        0.2%
6.45      1        0.2%
6.4958    2        0.4%
6.75      2        0.4%
6.95      1        0.2%
7.05      4        0.9%
7.0542    1        0.2%

Dataset B
Value     Count    Frequency (%)
0         11       2.5%
4.0125    1        0.2%
6.2375    1        0.2%
6.4375    1        0.2%
6.45      1        0.2%
6.4958    1        0.2%
6.75      1        0.2%
6.8583    1        0.2%
6.95      1        0.2%
6.975     1        0.2%

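Fare is strongly right-skewed. The median is about 14, while the maximum is 512.3292. That gives skewness of 4.58 (A) and 4.01 (B), and kurtosis of 32.29 and 22.77. The sketch below assumes the report delegates to pandas' `Series.skew()` and `Series.kurt()`. Under that assumption, these figures are the bias-adjusted sample skewness and excess kurtosis written out here in plain Python, so a normal distribution would score about 0 on both. The helper names are hypothetical.

```python
import math


def _central_moments(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness G1 (needs n >= 3)."""
    n, m2, m3, _ = _central_moments(values)
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis G2 (needs n >= 4)."""
    n, m2, _, m4 = _central_moments(values)
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


print(skewness([1, 2, 3, 4, 5]), excess_kurtosis([1, 2, 3, 4, 5]))
print(skewness([7.25, 7.9, 8.05, 13.0, 512.3292]))  # one extreme fare: strongly positive
```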
Cabin
Text

                 Dataset A    Dataset B
Distinct         83           85
Distinct (%)     86.5%        81.7%
Missing          350          342
Missing (%)      78.5%        76.7%
Memory size      7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       11           15
Median length    3            3
Mean length      3.6145833    3.6634615
Min length       1            1

Characters and Unicode

                        Dataset A    Dataset B
Total characters        347          381
Distinct characters     18           18
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        71           69
Unique (%)    74.0%        66.3%

Sample

           Dataset A    Dataset B
1st row    C54          C93
2nd row    C2           F E69
3rd row    D36          E17
4th row    E67          D35
5th row    F G73        B101

Most frequent words

Dataset A
Value                Count    Frequency (%)
c23                  3        2.7%
c25                  3        2.7%
c27                  3        2.7%
c93                  2        1.8%
d35                  2        1.8%
b28                  2        1.8%
b22                  2        1.8%
b96                  2        1.8%
b98                  2        1.8%
d26                  2        1.8%
Other values (82)    89       79.5%

Dataset B
Value                Count    Frequency (%)
c23                  4        3.3%
c25                  4        3.3%
c27                  4        3.3%
b96                  3        2.4%
b98                  3        2.4%
b58                  2        1.6%
b60                  2        1.6%
b20                  2        1.6%
f4                   2        1.6%
c124                 2        1.6%
Other values (84)    95       77.2%

Most occurring characters

Dataset A
Value               Count    Frequency (%)
2                   43       12.4%
C                   35       10.1%
3                   29       8.4%
B                   28       8.1%
1                   25       7.2%
6                   22       6.3%
5                   21       6.1%
8                   20       5.8%
7                   17       4.9%
(space)             16       4.6%
Other values (8)    91       26.2%

Dataset B
Value               Count    Frequency (%)
C                   40       10.5%
2                   38       10.0%
B                   35       9.2%
3                   33       8.7%
1                   27       7.1%
6                   26       6.8%
5                   24       6.3%
4                   22       5.8%
9                   19       5.0%
(space)             19       5.0%
Other values (8)    98       25.7%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 347 characters, 100.0%; Dataset B: 381 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Embarked
Categorical

                 Dataset A    Dataset B
Distinct         3            3
Distinct (%)     0.7%         0.7%
Missing          2            1
Missing (%)      0.4%         0.2%
Memory size      7.0 KiB      7.0 KiB

Value            Dataset A    Dataset B
S                322          320
C                82           88
Q                40           37

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                        Dataset A    Dataset B
Total characters        444          445
Distinct characters     3            3
Distinct categories     1            1
Distinct scripts        1            1
Distinct blocks         1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    S            C
2nd row    C            S
3rd row    S            S
4th row    S            S
5th row    Q            S

Common Values

Dataset A
Value        Count    Frequency (%)
S            322      72.2%
C            82       18.4%
Q            40       9.0%
(Missing)    2        0.4%

Dataset B
Value        Count    Frequency (%)
S            320      71.7%
C            88       19.7%
Q            37       8.3%
(Missing)    1        0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar charts of the common values for Dataset A and Dataset B]

Lowercased values, with frequencies computed over the non-missing values only:

Dataset A
Value    Count    Frequency (%)
s        322      72.5%
c        82       18.5%
q        40       9.0%

Dataset B
Value    Count    Frequency (%)
s        320      71.9%
c        88       19.8%
q        37       8.3%

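The two Embarked frequency tables differ only in their denominator. Common Values divides by all 446 rows, which gives the two missing entries their own 0.4% row. The lowercased table divides by the 444 non-missing values, so S is 322 / 446 = 72.2% in one and 322 / 444 = 72.5% in the other. Below is a sketch computing both columns, with Dataset A's Embarked column rebuilt from its counts; `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values):
    """(value, count, % of all rows, % of non-missing rows); None marks missing."""
    n_all = len(values)
    counts = Counter(v for v in values if v is not None)
    n_present = sum(counts.values())
    return [(value, count,
             round(100 * count / n_all, 1),
             round(100 * count / n_present, 1))
            for value, count in counts.most_common()]


embarked_a = ["S"] * 322 + ["C"] * 82 + ["Q"] * 40 + [None] * 2
for row in frequency_table(embarked_a):
    print(row)
# ('S', 322, 72.2, 72.5)
# ('C', 82, 18.4, 18.5)
# ('Q', 40, 9.0, 9.0)
```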
Most occurring characters

Dataset A
Value    Count    Frequency (%)
S        322      72.5%
C        82       18.5%
Q        40       9.0%

Dataset B
Value    Count    Frequency (%)
S        320      71.9%
C        88       19.8%
Q        37       8.3%

Most occurring categories, scripts, and blocks

In both datasets, every character falls into a single "(unknown)" category, script, and block (Dataset A: 444 characters, 100.0%; Dataset B: 445 characters, 100.0%). The per-category, per-script, and per-block character tables are therefore identical to Most occurring characters.

Interactions

[Figures: pairwise interaction plots, shown side by side for Dataset A and Dataset B]

Dataset A

2025-03-05T15:19:07.093220image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-05T15:19:09.569453image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-05T15:19:07.457267image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-05T15:19:09.936863image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

[Figures: correlation heatmaps for Dataset A and Dataset B]

Dataset A

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.119  -0.213        0.033   0.209   0.181  -0.169     0.169
Embarked      0.000     1.000   0.144   0.080        0.000   0.245   0.166   0.079     0.167
Fare          0.119     0.144   1.000   0.439       -0.068   0.469   0.169   0.432     0.250
Parch        -0.213     0.080   0.439   1.000       -0.028   0.000   0.249   0.486     0.159
PassengerId   0.033     0.000  -0.068  -0.028        1.000   0.000   0.000  -0.075     0.000
Pclass        0.209     0.245   0.469   0.000        0.000   1.000   0.084   0.123     0.307
Sex           0.181     0.166   0.169   0.249        0.000   0.084   1.000   0.218     0.527
SibSp        -0.169     0.079   0.432   0.486       -0.075   0.123   0.218   1.000     0.191
Survived      0.169     0.167   0.250   0.159        0.000   0.307   0.527   0.191     1.000

Dataset B

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.093  -0.216       -0.025   0.259   0.051  -0.231     0.100
Embarked      0.000     1.000   0.214   0.000        0.000   0.236   0.102   0.107     0.187
Fare          0.093     0.214   1.000   0.426       -0.006   0.454   0.231   0.439     0.347
Parch        -0.216     0.000   0.426   1.000        0.015   0.000   0.272   0.487     0.158
PassengerId  -0.025     0.000  -0.006   0.015        1.000   0.033   0.154  -0.074     0.150
Pclass        0.259     0.236   0.454   0.000        0.033   1.000   0.104   0.140     0.351
Sex           0.051     0.102   0.231   0.272        0.154   0.104   1.000   0.172     0.535
SibSp        -0.231     0.107   0.439   0.487       -0.074   0.140   0.172   1.000     0.175
Survived      0.100     0.187   0.347   0.158        0.150   0.351   0.535   0.175     1.000
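Comparing the two matrices entry by entry shows where the datasets disagree. A minimal sketch in plain Python, assuming the difference "Dataset B minus Dataset A" is a useful summary; it copies only six of the 36 off-diagonal coefficients from the tables above, and `largest_shifts` is a hypothetical helper, not part of ydata-profiling.

```python
# Six of the 36 off-diagonal coefficients, copied from the Dataset A and Dataset B tables.
corr_a = {
    ("Sex", "Survived"): 0.527,
    ("Pclass", "Survived"): 0.307,
    ("Fare", "Survived"): 0.250,
    ("PassengerId", "Survived"): 0.000,
    ("Age", "Sex"): 0.181,
    ("Age", "Pclass"): 0.209,
}
corr_b = {
    ("Sex", "Survived"): 0.535,
    ("Pclass", "Survived"): 0.351,
    ("Fare", "Survived"): 0.347,
    ("PassengerId", "Survived"): 0.150,
    ("Age", "Sex"): 0.051,
    ("Age", "Pclass"): 0.259,
}


def largest_shifts(a, b, top=3):
    """Return the `top` pairs with the largest |b - a|, as (pair, b - a) tuples."""
    diffs = {pair: b[pair] - a[pair] for pair in a if pair in b}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)[:top]


for (x, y), delta in largest_shifts(corr_a, corr_b):
    print(f"{x} vs {y}: {delta:+.3f}")
```

Among these pairs the largest change is PassengerId vs Survived (0.000 in Dataset A, 0.150 in Dataset B). Since PassengerId is a row identifier, a shift like this is more plausibly sampling noise than a real relationship.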

Missing values

[Figures: Dataset A and Dataset B]
A simple visualization of nullity by column.
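The per-column nullity that this chart visualizes can be tallied directly. A minimal sketch in plain Python, assuming rows are dicts in which `None` stands for a NaN cell; the values are the Age, Cabin and Embarked fields of the first Dataset A table under "Sample", and `nullity_by_column` is a hypothetical helper name.

```python
# Age, Cabin and Embarked for the ten rows of the first Dataset A sample table
# (None marks a cell shown as NaN in the report).
rows = [
    {"Age": 18.0, "Cabin": None, "Embarked": "S"},   # Silven
    {"Age": 28.0, "Cabin": None, "Embarked": "C"},   # Abelson, Mrs.
    {"Age": 10.0, "Cabin": None, "Embarked": "S"},   # Skoog
    {"Age": 51.0, "Cabin": None, "Embarked": "S"},   # Bateman
    {"Age": None, "Cabin": None, "Embarked": "Q"},   # Sadlier
    {"Age": None, "Cabin": None, "Embarked": "S"},   # Johnston
    {"Age": 20.0, "Cabin": None, "Embarked": "C"},   # Nakid
    {"Age": 24.0, "Cabin": "C54", "Embarked": "C"},  # Hays
    {"Age": 34.0, "Cabin": None, "Embarked": "S"},   # Johanson
    {"Age": 44.0, "Cabin": None, "Embarked": "S"},   # Torber
]


def nullity_by_column(rows):
    """Return {column: (missing_count, missing_percent)} for a list of dict rows."""
    total = len(rows)
    result = {}
    for column in rows[0]:
        missing = sum(1 for row in rows if row[column] is None)
        result[column] = (missing, 100.0 * missing / total)
    return result


for column, (missing, pct) in nullity_by_column(rows).items():
    print(f"{column}: {missing} missing ({pct:.1f}%)")
```

On these ten rows Cabin is 90% missing, higher than the 78.5% reported for all of Dataset A, so a small sample only roughly tracks the full-column statistics.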

[Figures: Dataset A and Dataset B]
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
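A text version of a nullity matrix can be sketched by drawing one character per cell, so that gaps in the same column line up vertically. This assumes rows are tuples with `None` for NaN; the values are Age, Cabin and Embarked from the first Dataset B table under "Sample", and `nullity_matrix` is a hypothetical helper.

```python
# Age, Cabin and Embarked from the first Dataset B sample table (None = NaN in the report).
COLUMNS = ("Age", "Cabin", "Embarked")
ROWS = [
    (28.0, None, "C"),     # Abelson, Mrs. Samuel
    (27.0, None, "S"),     # Johnson, Mrs. Oscar W
    (None, None, "S"),     # Lam, Mr. Ali
    (None, None, "S"),     # Shorney, Mr. Charles Joseph
    (48.0, None, "S"),     # Herman, Mrs. Samuel
    (14.0, None, "C"),     # Nicola-Yarred, Miss. Jamila
    (None, None, "C"),     # Emir, Mr. Farred Chehab
    (38.0, "C93", "S"),    # Hoyt, Mr. Frederick Maxfield
    (29.0, None, "S"),     # Zimmerman, Mr. Leo
    (None, "F E69", "C"),  # Peter, Miss. Anna
]


def nullity_matrix(rows):
    """One string per row: '#' for a filled cell, '.' for a missing one."""
    return ["".join("." if value is None else "#" for value in row) for row in rows]


print("".join(name[0] for name in COLUMNS))  # header with column initials: ACE
for line in nullity_matrix(ROWS):
    print(line)
```

Reading down a column shows where gaps cluster: here Cabin is almost entirely missing, Age has scattered gaps, and Embarked is complete.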

[Figures: Dataset A and Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
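Nullity correlation is commonly computed as the Pearson correlation between the columns' is-missing indicators (1 if the cell is missing, 0 otherwise), which is the approach taken by the missingno library's heatmap. The sketch below assumes that definition and applies it to the Age, Cabin and Embarked values of the first Dataset B table under "Sample". Columns that are never or always missing have zero variance and are skipped, because their correlation is undefined. `pearson` and `nullity_correlation` are hypothetical helper names.

```python
import math

# Age, Cabin and Embarked from the first Dataset B sample table (None = NaN in the report).
COLUMNS = ("Age", "Cabin", "Embarked")
ROWS = [
    (28.0, None, "C"), (27.0, None, "S"), (None, None, "S"), (None, None, "S"),
    (48.0, None, "S"), (14.0, None, "C"), (None, None, "C"), (38.0, "C93", "S"),
    (29.0, None, "S"), (None, "F E69", "C"),
]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nullity_correlation(columns, rows):
    """Correlate is-missing indicators for every pair of partly missing columns."""
    indicators = {c: [1 if row[i] is None else 0 for row in rows] for i, c in enumerate(columns)}
    # Drop columns that are never (or always) missing: zero variance, correlation undefined.
    varying = [c for c in columns if 0 < sum(indicators[c]) < len(rows)]
    return {
        (a, b): pearson(indicators[a], indicators[b])
        for i, a in enumerate(varying)
        for b in varying[i + 1:]
    }


for (a, b), r in nullity_correlation(COLUMNS, ROWS).items():
    print(f"{a} / {b}: {r:+.3f}")
```

For these rows the only pair left is Age / Cabin, at about -0.10; with ten rows that value says little about either full dataset.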

Sample

Dataset A

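Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
417 | 418 | 1 | 2 | Silven, Miss. Lyyli Karoliina | female | 18.0 | 0 | 2 | 250652 | 13.0000 | NaN | S
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C
819 | 820 | 0 | 3 | Skoog, Master. Karl Thorsten | male | 10.0 | 3 | 2 | 347088 | 27.9000 | NaN | S
150 | 151 | 0 | 2 | Bateman, Rev. Robert James | male | 51.0 | 0 | 0 | S.O.P. 1166 | 12.5250 | NaN | S
388 | 389 | 0 | 3 | Sadlier, Mr. Matthew | male | NaN | 0 | 0 | 367655 | 7.7292 | NaN | Q
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S
622 | 623 | 1 | 3 | Nakid, Mr. Sahid | male | 20.0 | 1 | 1 | 2653 | 15.7417 | NaN | C
310 | 311 | 1 | 1 | Hays, Miss. Margaret Bechstein | female | 24.0 | 0 | 0 | 11767 | 83.1583 | C54 | C
202 | 203 | 0 | 3 | Johanson, Mr. Jakob Alfred | male | 34.0 | 0 | 0 | 3101264 | 6.4958 | NaN | S
603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.0 | 0 | 0 | 364511 | 8.0500 | NaN | S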

Dataset B

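Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S
692 | 693 | 1 | 3 | Lam, Mr. Ali | male | NaN | 0 | 0 | 1601 | 56.4958 | NaN | S
95 | 96 | 0 | 3 | Shorney, Mr. Charles Joseph | male | NaN | 0 | 0 | 374910 | 8.0500 | NaN | S
754 | 755 | 1 | 2 | Herman, Mrs. Samuel (Jane Laver) | female | 48.0 | 1 | 2 | 220845 | 65.0000 | NaN | S
39 | 40 | 1 | 3 | Nicola-Yarred, Miss. Jamila | female | 14.0 | 1 | 0 | 2651 | 11.2417 | NaN | C
26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C
224 | 225 | 1 | 1 | Hoyt, Mr. Frederick Maxfield | male | 38.0 | 1 | 0 | 19943 | 90.0000 | C93 | S
422 | 423 | 0 | 3 | Zimmerman, Mr. Leo | male | 29.0 | 0 | 0 | 315082 | 7.8750 | NaN | S
128 | 129 | 1 | 3 | Peter, Miss. Anna | female | NaN | 1 | 1 | 2668 | 22.3583 | F E69 | C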

Dataset A

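Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S
136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S
502 | 503 | 0 | 3 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 7.6292 | NaN | Q
308 | 309 | 0 | 2 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C
338 | 339 | 1 | 3 | Dahl, Mr. Karl Edwart | male | 45.0 | 0 | 0 | 7598 | 8.0500 | NaN | S
572 | 573 | 1 | 1 | Flynn, Mr. John Irwin ("Irving") | male | 36.0 | 0 | 0 | PC 17474 | 26.3875 | E25 | S
566 | 567 | 0 | 3 | Stoytcheff, Mr. Ilia | male | 19.0 | 0 | 0 | 349205 | 7.8958 | NaN | S
328 | 329 | 1 | 3 | Goldsmith, Mrs. Frank John (Emily Alice Brown) | female | 31.0 | 1 | 1 | 363291 | 20.5250 | NaN | S
449 | 450 | 1 | 1 | Peuchen, Major. Arthur Godfrey | male | 52.0 | 0 | 0 | 113786 | 30.5000 | C104 | S
736 | 737 | 0 | 3 | Ford, Mrs. Edward (Margaret Ann Watson) | female | 48.0 | 1 | 3 | W./C. 6608 | 34.3750 | NaN | S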

Dataset B

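Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
442 | 443 | 0 | 3 | Petterson, Mr. Johan Emil | male | 25.0 | 1 | 0 | 347076 | 7.7750 | NaN | S
228 | 229 | 0 | 2 | Fahlstrom, Mr. Arne Jonas | male | 18.0 | 0 | 0 | 236171 | 13.0000 | NaN | S
443 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.0 | 0 | 0 | 230434 | 13.0000 | NaN | S
198 | 199 | 1 | 3 | Madigan, Miss. Margaret "Maggie" | female | NaN | 0 | 0 | 370370 | 7.7500 | NaN | Q
141 | 142 | 1 | 3 | Nysten, Miss. Anna Sofia | female | 22.0 | 0 | 0 | 347081 | 7.7500 | NaN | S
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C
547 | 548 | 1 | 2 | Padro y Manent, Mr. Julian | male | NaN | 0 | 0 | SC/PARIS 2146 | 13.8625 | NaN | C
561 | 562 | 0 | 3 | Sivic, Mr. Husein | male | 40.0 | 0 | 0 | 349251 | 7.8958 | NaN | S
802 | 803 | 1 | 1 | Carter, Master. William Thornton II | male | 11.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
461 | 462 | 0 | 3 | Morley, Mr. William | male | 34.0 | 0 | 0 | 364506 | 8.0500 | NaN | S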
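The sample tables already hint that the two datasets are not disjoint: the row for Abelson, Mrs. Samuel (Hannah Wizosky), PassengerId 875, appears in a Dataset A sample and a Dataset B sample. A minimal sketch of that check in plain Python, using the PassengerId values from the four sample tables above; running the same set intersection over every PassengerId in each dataset would give the full overlap.

```python
# PassengerId values from the two sample tables shown for each dataset.
sample_a = {418, 875, 820, 151, 389, 889, 623, 311, 203, 604,
            868, 137, 503, 309, 339, 573, 567, 329, 450, 737}
sample_b = {875, 9, 693, 96, 755, 40, 27, 225, 423, 129,
            443, 229, 444, 199, 142, 494, 548, 562, 803, 462}

# Identifiers present in both samples, sorted for stable output.
shared = sorted(sample_a & sample_b)
print(shared)  # [875]
```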

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
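A duplicate-row check amounts to counting identical rows. A minimal sketch in plain Python, assuming rows are tuples; `duplicate_rows` is a hypothetical helper that reports each repeated row together with its count, and the example rows are made up for illustration.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return (row, count) for each distinct row seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


# Made-up rows without an identifier: the first and third are identical.
without_id = [("male", 3, "S"), ("female", 1, "C"), ("male", 3, "S")]
print(duplicate_rows(without_id))  # [(('male', 3, 'S'), 2)]

# Prefixing a unique identifier (as PassengerId is) makes every row distinct.
with_id = [(i, *row) for i, row in enumerate(without_id, start=1)]
print(duplicate_rows(with_id))  # []
```

Because the Alerts mark PassengerId as unique in both datasets, no two full rows can be identical, which is consistent with the empty result reported here.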